A New Forecasting Approach for Oil Price Using the Recursive Decomposition–Reconstruction–Ensemble Method with Complexity Traits

Oil price forecasting has attracted considerable interest from academics and policymakers in recent years because of its widespread impact on economic sectors and markets. This paper therefore proposes a novel decomposition–reconstruction–ensemble method for crude oil price forecasting. Based on the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN) technique, we construct a recursive CEEMDAN decomposition–reconstruction–ensemble model that considers the complexity traits of crude oil data. In this model, the steps of mode reconstruction, component prediction, and ensemble prediction are driven by complexity traits. For illustration and verification purposes, the West Texas Intermediate (WTI) and Brent crude oil spot prices are used as the sample data. The empirical results demonstrate that the proposed model has better prediction performance than the benchmark models. Thus, the proposed recursive CEEMDAN decomposition–reconstruction–ensemble model can be an effective tool for forecasting oil prices.


Introduction
Crude oil, which is the world's most important chemical raw material and strategic resource, ensures the normal operation of the national economy and people's livelihoods, and it is a critical support for the development of the entire modern industrial society. Crude oil plays an important role in the global economy, political situation, and military strength of various countries as a basic energy source. As a result, changes in crude oil prices have sparked widespread concern worldwide. Because of the interactive impact of various factors such as the global economy, exchange rate changes, speculative behavior, and geopolitics, the oil price always exhibits non-linearity, non-stationarity, and high complexity, which poses significant challenges to crude oil price forecasting.
In the literature, various linear and nonlinear models have been used separately or in combination to make forecasts (see, e.g., Buyuksahin & Ertekin [1]). Linear methods assume that a given time series is regular with no sudden movements. This assumption is problematic because sudden movements with variation and extreme values are normal in many real-world time series such as financial data and renewable energy data (see, e.g., Xu et al. [2]). Numerous nonlinear time series prediction methods (see, e.g., Kantz & Schreiber [3]) have been proposed in the literature to capture these nonlinearities. Conventional linear methods are better suited to time series without high volatility and multicollinearity.
Zhang et al. [4] and Elman [5] show that nonlinear methods have advantages when modeling complex structures in time series with high accuracy. No universal model is suitable for all circumstances because each type of method outperforms the others in different domains. Capturing the general patterns in time series data with only one linear or nonlinear model appears to be difficult (see, e.g., Khashei & Bijari [6]). To overcome this limitation, Taskaya & Casey [7] proposed hybrid techniques with both linear and nonlinear models. The hybrid methodology is a synthesis of various prediction methods. It is usually a combination of traditional econometric models and AI algorithms (see, e.g., Wang et al. [8]) or a combination of different econometric models or AI algorithms.
In addition to the hybrid methodology, the ensemble learning algorithm is an important paradigm to overcome the limitations of single methods. Both the hybrid methodology and the ensemble method address the shortcomings of single models. With the divide-and-conquer strategy (see, e.g., Yu et al. [9] and Dong et al. [10]), the decomposition-ensemble learning methods are an important branch of ensemble learning paradigms. Because making an individual prediction for every decomposed component is time-consuming, the number of decomposed components needs to be reduced. Yu et al. [11] first proposed a decomposition-ensemble model with a reconstruction step that considered some data characteristics. Recently, Yu & Ma [12] introduced a memory-trait-driven reconstruction method into the decomposition and ensemble framework. Inspired by their work, a new model based on decomposition-ensemble learning with a reconstruction step that considers the data complexity traits is used to explore the price predictions of crude oil. In this model, all steps of mode reconstruction, component prediction, and ensemble prediction are driven by complexity traits. First, a decomposition-ensemble approach is used to decompose the oil price time series. Second, the complexity of each decomposed component is computed separately. Then, each component can be identified based on its complexity ranking from high to low. Different components are predicted with appropriate models. Finally, the forecasts for the different components are aggregated to produce the final prediction output. The contributions of the article are as follows:
i. A novel decomposition-reconstruction-ensemble method is proposed with clustering capability to capture the inner complexity traits. The performance of the proposed recursive CEEMDAN for different complexity traits of data is tested and validated using popular single models and several decomposition-reconstruction-ensemble models.
ii. The proposed recursive CEEMDAN technique is used to improve the performance of the CEEMDAN decomposition method by recursively decomposing the rapidly fluctuating components into less volatile sub-components.
iii. In the proposed recursive CEEMDAN decomposition-reconstruction-ensemble forecasting methodology, the reconstruction method, prediction method, and ensemble method are determined by the complexity traits of the crude oil data themselves.
The remainder of this paper is organized as follows. Section 2 considers a comparison to the related works. Research data and the decomposition-reconstruction-ensemble method are discussed in Section 3. Section 4 presents the error measures to validate the prediction models. Some main findings are illustrated by comparing the results of the proposed model to the benchmark models. The prediction performance of the proposed model is further discussed in Section 5. Section 6 summarizes this paper and provides the improvement direction of future research.

Forecasting by Statistical Models
Statistical models, which are also known as random time series models, include exponential smoothing (ES) (see, e.g., Kourentzes et al. [13]), the auto-regressive integrated moving average (ARIMA) model (see, e.g., Guo [14]), the generalized auto-regressive conditional heteroskedasticity (GARCH) model (see, e.g., Zhang et al. [15]), the hidden Markov model (HMM) (see, e.g., Isah & Bon [16]), and vector auto-regression (VAR) (see, e.g., Mirmirani & Li [17]). For example, Zolfaghari & Gholami [18] showed that ARIMA models had a good forecasting impact on international crude oil prices. To modify the mean and variance of the log returns of crude oil prices, Zhu et al. [19] introduced a hidden Markov model to obtain the behavior of random events and subjective factors for time series fluctuations. Using a VAR model, Drachal [20] applied the global economic policy uncertainty index, production, volatility index, and crude oil volatility to predict crude oil prices. Despite their simplicity and ease of implementation, these statistical models cannot directly process time series with nonlinear characteristics due to their linear correlation structure. Meanwhile, as soft computing technology has advanced, many different intelligent algorithms have been developed and widely used in various data predictions. However, conventional statistical and econometric models are constrained by stringent theoretical assumptions, including linearity, stationarity, and dependence on specific distributional properties. As a result, these methods may encounter limitations in accurately forecasting crude oil price time series that are non-stationary, nonlinear, and characterized by complex dynamics.

Forecasting by Artificial Intelligence and Machine Learning Methods
A crucial presumption in the application of econometric models is that the time series data under study are a linear process. However, crude oil prices do not satisfy this requirement, which can result in less accurate forecasting outcomes. In contrast, various nonlinear intelligence and machine learning methods (e.g., the support vector machine (SVM) proposed by Yu et al. [21] and the extreme learning machine (ELM) proposed by Wang et al. [22]) have emerged to satisfy the requirements, and they can be applied to time series prediction tasks. Moreover, deep learning is gaining popularity in machine learning, since conventional machine learning techniques employ shallow structures. Recently, artificial neural networks (ANNs) (see, e.g., Jammazi & Aloui [23]), back-propagation neural networks (BPNNs) (see, e.g., Khashei & Bijari [6]), long short-term memory (LSTM) networks (see, e.g., Urolagin et al. [24]), and convolutional neural networks (CNNs) (see, e.g., Li et al. [25]) have been used to model time series with nonlinear characteristics with high prediction precision. For example, Wang & Wang [26] created a crude oil price forecasting model that utilized a random Elman recurrent neural network, and the predictive power of the model was analyzed in comparison to other models. Yu et al. [27] incorporated the cutting-edge AI method of EELM into an ensemble model formulation to forecast crude oil prices, and findings showed that the suggested ensemble learning paradigm statistically outperformed all investigated benchmark models. However, these models have some drawbacks, including local minima, over-fitting, and the need for a large sample size. While it has been demonstrated that ensemble models can outperform individual models, they are still susceptible to issues such as overfitting and being trapped in local extrema, which can limit their ability to generalize effectively.

Forecasting by Hybrid Models
To overcome the limitations of the aforementioned techniques, hybrid models have been proposed. It is not uncommon for researchers to employ a combination of econometric models and artificial intelligence algorithms or even a combination of different econometric models or different artificial intelligence algorithms. For example, Cheng et al. [28] predicted crude oil prices in 2018 using the vector error correction and nonlinear auto-regressive neural network (VEC-NAR) model. To enhance the technical indicator-based crude oil price forecasting, He et al. [29] implemented a hybrid forecast approach using scaled principal component analysis (s-PCA). In-sample and out-of-sample performance comparisons revealed that the s-PCA model was superior to the compared models. Wang & Fang [30] developed a novel combination of the FNN model and stochastic time effective function for crude oil price forecasting, i.e., the WT-FNN model, and the findings revealed that the WT-FNN model had the best predictive impact. Zhang et al. [15] offered a novel hybrid technique to predict crude oil prices based on the least square support vector machine, particle swarm optimization, and GARCH model. The experimental findings demonstrated that this approach could accurately estimate crude oil prices. To predict crude oil prices accurately, Wang et al. [31] employed a Markov model to implement the GARCH-MIDAS model for both short-term and long-term state conversion, but they discovered that short-term predictions were more accurate. Like the hybrid approach, our proposed decomposition-ensemble method also takes into account the shortcomings of single models. The biggest difference is that ensemble learning employs several identical individual methods for ensemble prediction.

Forecasting by the Decomposition-Ensemble Learning Method
Recent studies have established a novel ensemble predicting approach called the decomposition ensemble to manage the challenge of forecasting nonlinear time-series data. Similar to the hybrid method, this approach considers the limitations of single models. Ensemble learning employs multiple identical single techniques for ensemble prediction, whereas the hybrid model employs multiple distinct single models for combination prediction. Oil price predictions typically rely on various significant studies. For example, Li et al. [25] and Li et al. [32] decomposed the monthly crude oil futures price data into multiple modes using VMD. Then, they forecast each mode using an SVM that was optimized by a genetic algorithm and a BPNN that was optimized by a genetic algorithm. Using the Akaike information criterion (AIC) to determine a reasonable lag, Ding [33] proposed a decomposition ensemble model using ensemble empirical mode decomposition (EEMD) for crude oil forecasting. Yu et al. [9] used empirical mode decomposition (EMD) to decompose crude oil prices and the feedforward neural network (FNN) to forecast the components. Zheng et al. [34] recently proposed a method combining an empirical mode decomposition algorithm, quadratic surface support vector regression, and the autoregressive integrated moving average method for stock index and futures price forecasting. The study obtained better forecasting results than the direct forecasting model. However, the existing literature on constructing the decomposition-ensemble framework has some limitations. It primarily focuses on selecting decomposition-reconstruction-prediction-ensemble methods based on the characteristics of the model, rather than taking into account the characteristics of the data themselves. Therefore, the method proposed in this paper selects appropriate decomposition methods, reconstruction methods, prediction methods, and ensemble methods based on the specific traits of the data.

Recursive Decomposition Method
In this paper, we propose a recursive CEEMDAN-based technique for time series forecasting, which attempts to extract more stable sub-components from rapidly changing components to improve the prediction accuracy. The architecture of the proposed method is given in Figure 1.
The proposed method recursively calls the CEEMDAN decomposition technique (see, e.g., Torres et al. [35]) for each component until it satisfies one of the following two conditions:

• The component becomes less complex than the given series.
• The correlation between the component and the given series exceeds a specified threshold.
The first condition takes into account the sample entropy values of each component. According to the methods proposed by Richman & Moorman [36], the sample entropy value is greater for more complicated components. Therefore, the more complicated components are decomposed again into their own sub-components via CEEMDAN in the algorithm.
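As a rough illustration of the complexity measure, the following Python sketch computes sample entropy in the sense of Richman & Moorman [36]. The embedding dimension m = 2 and the tolerance of 0.2 times the standard deviation are commonly used defaults assumed here, not values reported in this paper, and the function name is illustrative.

```python
import math
import random
import statistics


def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy SampEn(m, r) with r = r_factor * population std (assumed defaults)."""
    n = len(series)
    r = r_factor * statistics.pstdev(series)

    def count_matches(length):
        # Count template pairs (i < j) whose Chebyshev distance is within r.
        # The same n - m starting points are used for both template lengths.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                dist = max(abs(series[i + k] - series[j + k]) for k in range(length))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)       # matches of length m
    a = count_matches(m + 1)   # matches of length m + 1
    if a == 0 or b == 0:
        return float("inf")    # entropy is undefined when no matches are found
    return -math.log(a / b)


# A regular sine wave should score lower than white noise.
rng = random.Random(0)
sine = [math.sin(2 * math.pi * t / 25) for t in range(200)]
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]
print(sample_entropy(sine), sample_entropy(noise))
```

The double loop is quadratic in the series length, which is acceptable for a few hundred weekly observations but would need a vectorized version for long series.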
The second condition employs Pearson correlation (see, e.g., Hauke & Kossowski [37]) to determine the similarity between the specified component and the series. High correlation is a termination criterion for this recursive method. Recursive decomposition is halted if a sub-component is highly correlated with its parent component, regardless of its fluctuation rate. Then, based on the recursive CEEMDAN algorithm, different decomposed components of the original data and their sub-components are obtained. The decomposed components are identified as low-complexity components when they have smaller complexity traits than the original time series after the first decomposition. The decomposed components with larger complexity traits than the original time series are recognized as high-complexity components when they are recursively decomposed only once. The remaining decomposed components are recognized as medium-complexity components, which implies that these components have larger complexity traits than the original time series and are recursively decomposed two or more times.
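The recursion and its two stopping conditions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the decomposer and the complexity measure are passed in as callables (in the paper they would be CEEMDAN and sample entropy), the correlation threshold of 0.9 and the depth cap of 3 are hypothetical values, complexity is compared against the original series, and correlation is measured against the immediate parent. The moving-average split and the roughness measure are toy stand-ins so that the sketch runs without external libraries.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / math.sqrt(var_x * var_y)


def recursive_decompose(series, decompose, complexity, ref_complexity=None,
                        corr_threshold=0.9, max_depth=3, depth=1):
    """Return (component, depth) leaves; depth counts decomposition passes."""
    if ref_complexity is None:
        ref_complexity = complexity(series)  # complexity of the original series
    leaves = []
    for comp in decompose(series):
        less_complex = complexity(comp) < ref_complexity          # condition 1
        highly_correlated = pearson(comp, series) > corr_threshold  # condition 2
        if less_complex or highly_correlated or depth >= max_depth:
            leaves.append((comp, depth))
        else:
            leaves.extend(recursive_decompose(
                comp, decompose, complexity, ref_complexity,
                corr_threshold, max_depth, depth + 1))
    return leaves


def moving_average_split(series):
    """Toy additive decomposer (stand-in for CEEMDAN): detail + smooth = series."""
    smooth = []
    for i in range(len(series)):
        window = series[max(0, i - 1): i + 2]
        smooth.append(sum(window) / len(window))
    detail = [s - m for s, m in zip(series, smooth)]
    return [detail, smooth]


def roughness(series):
    """Toy complexity measure (stand-in for sample entropy)."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / (len(series) - 1)
```

Because every decomposition pass is additive, the returned leaves always sum back to the input series, which is the property the later simple-addition ensemble relies on.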

Performance Evaluation Criteria
To verify the validity of a forecast, the model outcomes are assessed. Numerous experiments are conducted to evaluate the forecasting performance of the proposed hybrid model and the reference models. In this paper, we use three popular accuracy measures, namely the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), with the following corresponding definitions:

$$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|d_t - O_t\right|$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(d_t - O_t\right)^2}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{d_t - O_t}{d_t}\right|$$

where $d_t$ and $O_t$ are the real and predicted values at time $t$ ($t = 1, 2, \ldots, N$); $N$ is the number of samples in the testing data set; and $\bar{d}_t$ and $\bar{O}_t$ are the average values of the actual value and predicted value, respectively.

In addition, a Diebold-Mariano (DM) test (see, e.g., Yu et al. [38]) is used to test the statistical significance of the superiority of the proposed model. Furthermore, popular single models and several decomposition-reconstruction-ensemble models are built as benchmark models to test the effectiveness of the proposed model. In detail, ES is constructed as the single benchmark model for the traditional econometric model. For AI models, SVR, ELM, and ANN are developed as single benchmark models. As benchmarks for decomposition-reconstruction-ensemble models, four similar decomposition-reconstruction-ensemble frameworks with different basic prediction models are built.
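The MAE, RMSE, and MAPE values reported in the experiments can be computed as in the following minimal sketch. It assumes the standard textbook definitions of the three measures and expresses MAPE as a fraction rather than a percentage, which appears consistent with the magnitude of the MAPE values quoted later in the paper. The sample numbers are made up for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(d - o) for d, o in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((d - o) ** 2 for d, o in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; actual values must be nonzero."""
    return sum(abs((d - o) / d) for d, o in zip(actual, predicted)) / len(actual)


# Illustrative (made-up) prices and forecasts.
actual = [100.0, 50.0, 80.0]
predicted = [110.0, 45.0, 80.0]
print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```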

Research Data
In this paper, the weekly WTI and Brent crude oil spot prices from the US Energy Information Administration (EIA) (http://www.eia.doe.gov/ (accessed on 11 August 2022)) were selected as sample data. The sampling period was from 1 January 2010 to 31 December 2021, and there are 627 observations in total. The training set includes 418 observations (about two-thirds of the total sample), and the test set includes 209 observations (about one-third of the total sample). The test data set is used to evaluate how well the proposed model performs compared to the benchmark models. Table 1 displays these initial crude oil price series with their statistical measurements, which include the minima, maxima, means, and standard deviations. The Anderson-Darling test rejects the null hypothesis of a Gaussian distribution, which is consistent with the nonzero skewness and positive excess kurtosis of the series. Overall, the chosen observations are not stationary, and the model construction should consider the necessary data preprocessing.
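The data preparation described above can be sketched as follows. The sketch assumes that the split is chronological (the earliest 418 weekly observations for training and the remaining 209 for testing), which the text does not state explicitly, and it uses moment-based skewness and excess kurtosis; the function names are illustrative.

```python
import math


def chronological_split(series, n_train):
    """Split a time-ordered series into training and test parts without shuffling."""
    return series[:n_train], series[n_train:]


def describe(series):
    """Summary statistics of the kind reported for the price series."""
    n = len(series)
    mean = sum(series) / n
    m2 = sum((x - mean) ** 2 for x in series) / n  # central moments
    m3 = sum((x - mean) ** 3 for x in series) / n
    m4 = sum((x - mean) ** 4 for x in series) / n
    return {
        "min": min(series),
        "max": max(series),
        "mean": mean,
        "std": math.sqrt(m2 * n / (n - 1)),      # sample standard deviation
        "skewness": m3 / m2 ** 1.5,              # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,   # 0 for a Gaussian distribution
    }


# Placeholder series with the same length as the sample (not real prices).
prices = [float(t) for t in range(627)]
train, test = chronological_split(prices, 418)
print(len(train), len(test))
```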

Experimental Result Analysis
First, the original time series of WTI and Brent crude oil prices are decomposed by CEEMDAN, as shown in Figures 2 and 3. In particular, the price series of WTI and Brent crude oil are decomposed into 8 IMF components and one residual term. Each of the intrinsic mode functions can be categorized into high and low frequencies, with each component showcasing unique characteristics. The decomposition analysis reveals that the residue component exhibits noteworthy long-term trends, while sub-components 1 to 8 are stationary or nearly stationary, as illustrated in Figures 2 and 3. However, the effectiveness of the decomposition process in improving crude oil price forecasting performance remains an open topic for further discussion in subsequent sections.
In the second step, component reconstruction is performed to reduce the computational time complexity. According to Tables 2 and 3, different decomposed modes have different degrees of complexity, and the complexity traits of each decomposed mode show a downward trend with an increasing time scale. Subsequently, based on the recursive CEEMDAN algorithm, all components are recognized as high-complexity components, medium-complexity components, or low-complexity components. More concretely, IMFs and residual components are identified as low-complexity when they have smaller complexity traits than the original time series after the first decomposition. The IMFs with larger complexity traits than the original time series are recognized as high-complexity components when they are recursively decomposed with only one step. The other IMFs are recognized as medium-complexity components. These components have larger complexity traits than the original time series, and they are recursively decomposed with two or more steps. Tables 2 and 3 report the test results of the complexity traits for each decomposed component of WTI and Brent crude oil prices, respectively.
Next, it is necessary to select a suitable method to predict the different components. According to the complexity test results, nine components are reduced to six components after the reconstruction. In addition, the complexity traits of the decomposition components change when the components change. Based on the reconstruction method, three kinds of components with different degrees of complexity, namely the high-complexity component, medium-complexity component, and low-complexity component, can be obtained. Then, the selection of suitable predictive methods driven by complexity traits is achieved through a trial-and-error approach. Tables 4-9 present the selection results for predicting different decomposed components of WTI and Brent crude oil prices with complexity traits.
Tables 4-6 show the performance values of different combination models such as X-SVR-SVR, SVR-X-SVR, and SVR-SVR-X. For example, in the X-SVR-SVR model (see Table 4), the second and third SVR methods indicate that the medium-complexity component and low-complexity component use the SVR model, while X tries four different methods (i.e., ES, SVR, ELM, ANN) to find a suitable model for the high-complexity component. For computational convenience, the ADD is temporarily employed as the ensemble method for investigating the relationship between each complexity component and the prediction method. Based on the aforementioned explanations, Table 4 presents the experimental findings regarding the high-complexity components.
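The trial-and-error selection for one slot of a combination such as X-SVR-SVR can be sketched as below. The sketch assumes that ADD denotes a point-by-point sum of the component forecasts and that RMSE is the selection criterion; the candidate forecasts are passed in as precomputed lists, and all names and numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((d - o) ** 2 for d, o in zip(actual, predicted)) / len(actual))


def add_ensemble(component_forecasts):
    """Simple addition: sum the component forecasts point by point (assumed meaning of ADD)."""
    return [sum(values) for values in zip(*component_forecasts)]


def select_model(actual, candidates, fixed_forecasts):
    """Pick the candidate whose forecast, added to the fixed components, has the lowest RMSE.

    candidates: dict mapping a model name to its forecast of the component under study.
    fixed_forecasts: forecasts of the other components, held fixed during the trial.
    """
    scores = {}
    for name, forecast in candidates.items():
        total = add_ensemble([forecast] + list(fixed_forecasts))
        scores[name] = rmse(actual, total)
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical numbers: two candidate models for the high-complexity component.
actual = [10.0, 11.0, 12.0]
fixed = [[8.0, 9.0, 10.0]]  # combined forecast of the other components
candidates = {"model_a": [2.0, 2.0, 2.0], "model_b": [1.0, 1.0, 1.0]}
print(select_model(actual, candidates, fixed))
```

Repeating the same call with the slot moved to the medium-complexity and low-complexity components corresponds to the SVR-X-SVR and SVR-SVR-X trials.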
For the parameter of the ES, a simple first-order ES with a smoothing constant is chosen. The smoothing constant is determined using the principle of the minimum root mean square error. For the parameters of the SVR model, the Gaussian RBF kernel function is adopted, and the grid search method is used to set the regularization and kernel parameters. For the ELM and ANN models, the number of nodes in the hidden layer is set to 30.
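The ES setup described above can be illustrated with a minimal sketch: first-order exponential smoothing with one-step-ahead forecasts, where the smoothing constant is picked from a grid by minimum RMSE. The grid of 100 values in (0, 1], the initialization of the level at the first observation, and the function names are assumptions made for this example.

```python
import math


def ses_forecasts(series, alpha):
    """One-step-ahead forecasts; forecasts[t] is the prediction of series[t]."""
    level = series[0]        # initialize the level at the first observation
    forecasts = [level]
    for y in series[:-1]:
        level = alpha * y + (1 - alpha) * level
        forecasts.append(level)
    return forecasts


def fit_smoothing_constant(series, grid_size=100):
    """Grid search for the smoothing constant with the lowest in-sample RMSE."""
    best_alpha, best_rmse = None, float("inf")
    for k in range(1, grid_size + 1):
        alpha = k / grid_size
        forecasts = ses_forecasts(series, alpha)
        # Skip t = 0, where the forecast equals the observation by construction.
        squared = [(y - f) ** 2 for y, f in zip(series[1:], forecasts[1:])]
        error = math.sqrt(sum(squared) / len(squared))
        if error < best_rmse:
            best_alpha, best_rmse = alpha, error
    return best_alpha, best_rmse


# For a steadily rising series, the best constant is 1 (track the last value).
trend = [float(t) for t in range(1, 21)]
print(fit_smoothing_constant(trend))
```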
Tables 4-6 illustrate that an ANN is suitable for high-complexity component forecasting, while SVR is suitable for both medium-complexity and low-complexity component forecasting. The ANN-SVR-SVR has better prediction accuracy than other model combinations for WTI crude oil price forecasting. Tables 7-9 show the experimental results of the high-complexity component, medium-complexity component, and low-complexity component, respectively. Similarly, the SVR-SVR-SVR has better prediction performance than other model combinations for Brent crude oil price forecasting according to Tables 7-9.

Prediction Performance Comparison
In this part, the proposed model, four single models (i.e., ES, SVR, ELM, ANN), and four decomposition-reconstruction-ensemble models (i.e., D-R-ES, D-R-SVR, D-R-ELM, D-R-ANN), which are considered benchmark models, are used to predict the testing dataset of WTI and Brent crude oil prices. Here, "D" denotes the chosen decomposition method, and "R" denotes the proposed reconstruction rule of the component. The results are shown in Tables 10-13. According to these results, the proposed model outperforms almost all of the considered benchmark models. The final form of the proposed model is simply the decomposition-reconstruction-ensemble model with the form of "D-R-SVR" for Brent crude oil price forecasting. Thus, the model with the form of "D-R-SVR" is not considered a target model in Table 13.
Furthermore, the decomposition-reconstruction-ensemble models make better predictions than the single models according to Tables 10 and 11. In particular, for WTI crude oil price forecasting, the decomposition-reconstruction-ensemble models have average MAE, RMSE, and MAPE values of 1.0867, 1.6535, and 0.0345, respectively, while the single models have average MAE, RMSE, and MAPE values of 1.2334, 1.8382, and 0.0408, respectively. For the Brent crude oil price forecasting, the prediction accuracy values for the decomposition-reconstruction-ensemble models are 1.1463, 1.5782, and 0.0233, while those for the single models are 1.3429, 1.9266, and 0.0290. The main reason is that the decomposition-reconstruction-ensemble approach can reduce the complexity of crude oil data, which boosts its prediction performance compared to the benchmark single models.

Compared with the eight benchmark models, i.e., ES, SVR, ELM, ANN, D-R-ES, D-R-SVR, D-R-ELM, and D-R-ANN, the proposed model shows superior performance in crude oil price forecasting. In Table 10, the proposed model improves the prediction accuracy by 59.89%, 63.33%, and 61.82% on average compared to the benchmark single models and by 52.42%, 55.06%, and 53.10% on average compared to the benchmark decomposition-reconstruction-ensemble models. Table 11 shows that the proposed model improves the accuracy of the Brent crude oil price forecasting by 62.01%, 65.88%, and 65.66% on average compared to the benchmark single models and by 52.96%, 51.17%, and 51.47% on average compared with the benchmark decomposition-reconstruction-ensemble models. Therefore, the proposed recursive CEEMDAN decomposition-reconstruction-ensemble prediction method can effectively improve the prediction performance of WTI and Brent crude oil prices.
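The paper does not spell out how the average improvement percentages are computed. One common convention, assumed in the sketch below, is the relative error reduction (benchmark error minus proposed error, divided by benchmark error), averaged over the benchmark models for a given measure. The numbers are made up for illustration and are not taken from the tables.

```python
def average_improvement(proposed_error, benchmark_errors):
    """Mean relative error reduction (in percent) against a list of benchmark errors."""
    reductions = [
        (bench - proposed_error) / bench * 100.0
        for bench in benchmark_errors
    ]
    return sum(reductions) / len(reductions)


# Hypothetical example: proposed MAE of 1.0 against benchmark MAEs of 2.0 and 4.0.
print(average_improvement(1.0, [2.0, 4.0]))
```

Under this convention, a value of 50% means that the proposed model's error is half of the benchmark error on average.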
In addition, the DM test is used to compare the prediction performance of the different models in Tables 12 and 13 and to assess statistically the superiority of the proposed model for WTI and Brent crude oil price forecasting. The conclusions are supported by the DM statistics and the corresponding p-values (in brackets). First, at a significance level of 5%, the proposed model outperforms all benchmark models, which suggests that the proposed recursive CEEMDAN decomposition-reconstruction-ensemble prediction model is better than the listed benchmark models for WTI and Brent crude oil price forecasting. Second, when the decomposition-reconstruction-ensemble models in the benchmark models are tested as the target models in Tables 12 and 13, only the D-R-SVR can be shown to be better than all single models at the significance level of 5%. Third, focusing on different decomposition-reconstruction-ensemble models in the benchmark models, although the D-R-SVR can be statistically demonstrated to be better than its D-R-based counterparts at the significance level of 5%, it is essential to choose the appropriate prediction model for the reconstructed components with different degrees of complexity.
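A simplified version of the DM comparison can be sketched as follows. The sketch assumes squared-error loss, one-step-ahead forecasts (so no autocovariance correction is applied), and a one-sided alternative that the target model is more accurate than the benchmark, with the p-value taken from the standard normal distribution. The exact variant used in the paper is not specified, and the sample numbers are made up.

```python
import math


def diebold_mariano(actual, pred_target, pred_benchmark):
    """Return (DM statistic, one-sided p-value) for H1: target is more accurate.

    The loss differential is d_t = e_target^2 - e_benchmark^2, so a negative
    statistic favors the target model.
    """
    d = [
        (a - t) ** 2 - (a - b) ** 2
        for a, t, b in zip(actual, pred_target, pred_benchmark)
    ]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n  # lag-0 autocovariance only
    stat = mean_d / math.sqrt(var_d / n)
    p_value = 0.5 * (1.0 + math.erf(stat / math.sqrt(2.0)))  # standard normal CDF
    return stat, p_value


# Illustrative (made-up) series: the target errors are much smaller.
actual = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0, 13.0, 15.0]
target_err = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
bench_err = [1.0, -1.5, 0.8, -1.2, 1.1, -0.9, 1.3, -1.0]
target = [a + e for a, e in zip(actual, target_err)]
bench = [a + e for a, e in zip(actual, bench_err)]
print(diebold_mariano(actual, target, bench))
```

A p-value below the chosen significance level is read as evidence that the target model forecasts more accurately than the benchmark.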

Further Discussion
In this section, we use the EEMD decomposition method and two different reconstruction rules to compare the prediction performance of the proposed model. The two rules are mode reconstruction based on the threshold setting of SE (see, e.g., Zhang et al. [39]) and fine-to-coarse (FTC) (see, e.g., Yu et al. [38] and Zhang et al. [39]). Different models are built as the benchmark models, which are denoted in the form of R-D-R-SA, where "R-D" indicates different recursive decomposition methods to be compared, "R" indicates different reconstruction rules, and "SA" represents the selected predictive methods driven by the complexity traits and simple addition for the final ensemble. Tables 14 and 15 and Figures 4 and 5 show the results of different models. Similarly, the DM test is performed to evaluate the accuracy of different prediction models, and the corresponding results are presented in Table 16. According to Tables 14-16 and Figures 4 and 5, the main findings are as follows.
First, as Tables 14 and 15 show, no model can outperform other models under all indicators. Compared with the EEMD decomposition-based models, the proposed model for WTI crude oil price forecasting improves the prediction accuracy by 10.10%, 13.28%, and 11.35% on average, and the proposed model for Brent crude oil price forecasting improves the prediction accuracy by 17.27%, 21.0%, and 16.50% on average. One possible reason is that CEEMDAN minimizes the complexity of WTI and Brent crude oil price data. Thus, it can effectively filter out the meaningful components and significantly enhance the forecast accuracy.
Second, the proposed model is better than the benchmark models based on other reconstruction rules. Concretely, compared with the benchmark models with different reconstruction rules, the proposed model for WTI crude oil price forecasting improves the prediction accuracy by 3.51%, 6.75%, and 4.59% on average, and the proposed model for Brent crude oil price forecasting improves the prediction accuracy by 7.78%, 9.40%, and 7.05% on average. Table 16 also shows that the DM test at the 10% level of significance confirms the superiority of the suggested model. Thus, the WTI and Brent crude oil data can be better predicted using the proposed reconstruction approach based on the complexity trait.
Third, the proposed model has lower MAE, RMSE, and MAPE than other models based on the EEMD decomposition models and reconstruction rules from Figures 4 and 5. For example, compared with different reconstruction methods in the benchmark models, the proposed model for WTI crude oil price forecasting improves the prediction accuracy by 10.89%, 16.03%, and 12.75% on average, and the proposed model for Brent crude oil price forecasting improves the prediction accuracy by 20.04%, 24.32%, and 18.84% on average. Thus, the proposed model improves the prediction performance in WTI and Brent crude oil price forecasting. Meanwhile, as shown in Table 16, when the proposed model is used as the target model, all p-values of the DM test fall below the threshold of 10%, so the proposed model has a significantly higher level of accuracy in its predictions than the benchmark models.

Conclusions and Future Directions
This paper proposes a new complexity-traits-driven recursive CEEMDAN decomposition-reconstruction-ensemble method for WTI and Brent crude oil price forecasting. All steps of component reconstruction for decomposed components, component prediction, and ensemble prediction are driven by the complexity traits, and the proposed method proves to be more effective than the benchmark models.
In the empirical analysis, the proposed recursive CEEMDAN decomposition-reconstruction-ensemble learning paradigm is significantly better than the most popular single models, different decomposition-reconstruction-ensemble models, and ensemble models based on the EEMD decomposition methods or different reconstruction rules. Based on the empirical experiments, four insightful conclusions can be summarized.
First, the prediction accuracy of WTI and Brent crude oil price data demonstrates that the proposed model outperforms all benchmark models. Specifically, compared with different benchmark models, the proposed model for WTI crude oil price forecasting improves the prediction accuracy by 56.16%, 59.19%, and 57.46% on average, and the proposed model for Brent crude oil price forecasting improves the prediction accuracy by 57.48%, 58.53%, and 58.56% on average. Therefore, the proposed model can be a useful tool to forecast WTI and Brent crude oil prices in the near future.
Second, CEEMDAN can achieve better prediction performance than the EEMD decomposition-based method. For example, compared with the EEMD decomposition-based models, on average, the proposed model improves the prediction accuracy by 10.10%, 13.28%, and 11.35% for WTI crude oil price forecasting and by 17.27%, 21.0%, and 16.50% for Brent crude oil price forecasting.
Third, the prediction performance of crude oil price data can be further improved by selecting appropriate prediction models for the reconstructed components with different degrees of complexity. For example, compared with the benchmark decomposition-reconstruction-ensemble models (i.e., D-R-ES, D-R-ELM, D-R-SVR, and D-R-ANN), on average, the proposed model improves the prediction accuracy by 52.42%, 55.06%, and 53.10% for WTI crude oil price forecasting and by 52.96%, 51.17%, and 51.47% for Brent crude oil price forecasting. Therefore, it is essential to choose the appropriate prediction models according to the complexity traits.
Finally, compared with the existing reconstruction rules, the recursive decomposition-reconstruction method based on the complexity traits can reduce the modeling complexity well, which shows its usefulness and efficacy in WTI and Brent crude oil price forecasting. For example, on average, the proposed model improves the prediction accuracy by 10.89%, 16.03%, and 12.75% for WTI crude oil price forecasting and by 20.04%, 24.32%, and 18.84% for Brent crude oil price forecasting. Thus, mode reconstruction driven by complexity traits is effective.
In addition to the sample entropy used by our recursive CEEMDAN method, other time series features such as the frequency change rate and autocorrelation can be used. Future research extensions will focus on the following: (1) verifying more advanced decomposition methods under the proposed framework in this paper and (2) exploring more results in other research areas such as the stock market, power market, and other emerging markets using the proposed complexity-trait-driven reconstruction-ensemble learning paradigm.